---
title: "Reference: Azure | Voice"
description: "Documentation for the AzureVoice class, providing text-to-speech and speech-to-text capabilities using Azure Cognitive Services."
---

# Azure

The AzureVoice class in Mastra provides text-to-speech and speech-to-text capabilities using Microsoft Azure Cognitive Services.

## Usage Example

Using `AzureVoice` requires Azure Speech Services credentials, which can be provided through environment variables or directly in the configuration:

```typescript
import { AzureVoice } from "@mastra/voice-azure";

// Initialize with configuration
const voice = new AzureVoice({
  speechModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    voiceName: "en-US-AriaNeural", // Optional: specific voice for TTS
  },
  listeningModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    language: "en-US", // Optional: recognition language for STT
  },
  speaker: "en-US-JennyNeural", // Optional: default voice
});

// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
  speaker: "en-US-GuyNeural", // Optional: override default voice
});

// Convert speech to text
const text = await voice.listen(audioStream);
```

## Configuration

### Constructor Options

<PropertiesTable
  content={[
    {
      name: "speechModel",
      type: "AzureSpeechConfig",
      description: "Configuration for text-to-speech synthesis.",
      isOptional: true,
    },
    {
      name: "listeningModel",
      type: "AzureSpeechConfig",
      description: "Configuration for speech-to-text recognition.",
      isOptional: true,
    },
    {
      name: "speaker",
      type: "string",
      description: "Default voice ID for speech synthesis.",
      isOptional: true,
    },
  ]}
/>

### AzureSpeechConfig

Configuration object for speech synthesis (`speechModel`) and recognition (`listeningModel`).

<PropertiesTable
  content={[
    {
      name: "apiKey",
      type: "string",
      description:
        "Azure Speech Services API key (NOT Azure OpenAI key). Falls back to AZURE_API_KEY environment variable.",
      isOptional: true,
    },
    {
      name: "region",
      type: "string",
      description:
        "Azure region (e.g., 'eastus', 'westeurope'). Falls back to AZURE_REGION environment variable.",
      isOptional: true,
    },
    {
      name: "voiceName",
      type: "string",
      description:
        "Voice ID for speech synthesis (e.g., 'en-US-AriaNeural', 'en-US-JennyNeural'). Only used in speechModel. See voice list below.",
      isOptional: true,
    },
    {
      name: "language",
      type: "string",
      description:
        "Recognition language code (e.g., 'en-US', 'fr-FR'). Only used in listeningModel.",
      isOptional: true,
    },
  ]}
/>
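
Both `speechModel` and `listeningModel` are optional, so you can configure only the direction you need. A minimal text-to-speech-only sketch (the region and voice name here are illustrative):

```typescript
import { AzureVoice } from "@mastra/voice-azure";

// TTS-only setup: listeningModel is simply omitted.
const ttsOnly = new AzureVoice({
  speechModel: {
    apiKey: process.env.AZURE_API_KEY, // Azure Speech Services key, not Azure OpenAI
    region: "westeurope",
    voiceName: "en-GB-SoniaNeural", // default synthesis voice for this instance
  },
});
```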

## Methods

### speak()

Converts text to speech using Azure's neural text-to-speech service.

<PropertiesTable
  content={[
    {
      name: "input",
      type: "string | NodeJS.ReadableStream",
      description: "Text or text stream to convert to speech.",
      isOptional: false,
    },
    {
      name: "options.speaker",
      type: "string",
      description: "Voice ID to use for speech synthesis (e.g., 'en-US-JennyNeural'). Overrides the default voice.",
      isOptional: true,
      defaultValue: "Constructor's speaker value",
    },
  ]}
/>

Returns: `Promise<NodeJS.ReadableStream>` - Audio stream in WAV format
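
Because `speak()` resolves to a readable WAV stream, one common pattern is piping it to a file. A sketch assuming `voice` is the instance from the usage example above (the output path is arbitrary):

```typescript
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

const greeting = await voice.speak("Thanks for calling. How can I help?", {
  speaker: "en-US-DavisNeural", // optional per-call voice override
});

// speak() returns WAV audio; stream it to disk for playback or storage.
await pipeline(greeting, createWriteStream("greeting.wav"));
```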

### listen()

Transcribes audio using Azure's speech-to-text service.

<PropertiesTable
  content={[
    {
      name: "audioStream",
      type: "NodeJS.ReadableStream",
      description: "Audio stream to transcribe. Must be in WAV format.",
      isOptional: false,
    },
  ]}
/>

Returns: `Promise<string>` - The recognized text from the audio

**Note:** Language and recognition settings are configured in the `listeningModel` configuration during initialization, not passed as options to this method.
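
A sketch transcribing an existing recording (the file path is illustrative; the file must contain WAV audio, and the recognition language comes from `listeningModel`, as noted above):

```typescript
import { createReadStream } from "node:fs";

// The stream must carry WAV audio; the recognition language was set
// via listeningModel.language when the AzureVoice instance was created.
const transcript = await voice.listen(createReadStream("meeting-clip.wav"));
console.log(transcript);
```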

### getSpeakers()

Returns an array of available voice options (200+ voices), where each entry contains:

<PropertiesTable
  content={[
    {
      name: "voiceId",
      type: "string",
      description:
        "Unique identifier for the voice (e.g., 'en-US-JennyNeural', 'fr-FR-DeniseNeural')",
      isOptional: false,
    },
    {
      name: "language",
      type: "string",
      description: "Language code extracted from voice ID (e.g., 'en', 'fr')",
      isOptional: false,
    },
    {
      name: "region",
      type: "string",
      description: "Region code extracted from voice ID (e.g., 'US', 'GB', 'FR')",
      isOptional: false,
    },
  ]}
/>

Returns: `Promise<Array<{ voiceId: string; language: string; region: string; }>>`
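
Since language and region are included for each entry, the list can be filtered client-side; for example, to keep only British English voices:

```typescript
const speakers = await voice.getSpeakers();

// Each entry exposes the language and region parsed from the voice ID,
// so simple filters like "British English only" are straightforward.
const britishVoices = speakers
  .filter((s) => s.language === "en" && s.region === "GB")
  .map((s) => s.voiceId);

console.log(britishVoices); // e.g., ['en-GB-SoniaNeural', 'en-GB-RyanNeural', ...]
```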

## Important Notes

### Azure Speech Services vs Azure OpenAI

**⚠️ Critical:** This package uses **Azure Speech Services**, which is different from **Azure OpenAI Services**.

- **DO NOT** use your `AZURE_OPENAI_API_KEY` for this package
- **DO** use an Azure Speech Services subscription key (obtain from Azure Portal under "Speech Services")
- These are separate Azure resources with different API keys and endpoints

### Environment Variables

API keys and regions can be provided via constructor options or environment variables:

- `AZURE_API_KEY` - Your Azure Speech Services subscription key
- `AZURE_REGION` - Your Azure region (e.g., 'eastus', 'westeurope')
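
If both variables are set, the constructor options can stay minimal. A sketch that relies entirely on the environment-variable fallback:

```typescript
import { AzureVoice } from "@mastra/voice-azure";

// Relies on AZURE_API_KEY and AZURE_REGION being set in the environment,
// so no keys or regions appear in the code or config files.
const voice = new AzureVoice({
  speechModel: { voiceName: "en-US-AriaNeural" },
  listeningModel: { language: "en-US" },
});
```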

### Voice Capabilities

- Azure offers 200+ neural voices across 50+ languages
- Each voice ID follows the format `{language}-{region}-{name}Neural` (e.g., 'en-US-JennyNeural'); see the parsing sketch after this list
- Some voices include multilingual support or HD quality variants
- Audio output is in WAV format
- Audio input for recognition must be in WAV format
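
A minimal sketch of splitting a voice ID into its parts with plain string handling (this helper is illustrative, not part of the package API):

```typescript
// 'en-US-JennyNeural' -> { language: 'en', region: 'US', name: 'JennyNeural' }
function parseVoiceId(voiceId: string) {
  const [language, region, ...rest] = voiceId.split("-");
  return { language, region, name: rest.join("-") };
}

console.log(parseVoiceId("en-US-JennyNeural"));
```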

## Available Voices

Azure provides 200+ neural voices across many languages. Some popular English voices include:

- **US English:**
  - `en-US-AriaNeural` (Female, default)
  - `en-US-JennyNeural` (Female)
  - `en-US-GuyNeural` (Male)
  - `en-US-DavisNeural` (Male)
  - `en-US-AvaNeural` (Female)
  - `en-US-AndrewNeural` (Male)

- **British English:**
  - `en-GB-SoniaNeural` (Female)
  - `en-GB-RyanNeural` (Male)
  - `en-GB-LibbyNeural` (Female)

- **Australian English:**
  - `en-AU-NatashaNeural` (Female)
  - `en-AU-WilliamNeural` (Male)

To get a complete list of all 200+ voices:

```typescript
const voices = await voice.getSpeakers();
console.log(voices); // Array of { voiceId, language, region }
```

For more information, see the [Azure Neural TTS documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts).
